Workshop: "Data Science on Hadoop"
In this full-day workshop on data science using Apache Hadoop, you will learn how to work with large data sets, extract meaningful information from them, and apply machine learning models to build data-driven functionality. You will work with a large, real-world data set on a full-blown Hadoop cluster running in the cloud.
We will start with an introduction to the activities of a data scientist and some of the concepts involved. In the first part, we will get hands-on with exploratory data analysis on a large data set using Apache Hadoop, Apache Spark and Python. In the second part, we will build a full-blown data science solution using a large data set and machine learning models.
This workshop focuses on hands-on practice rather than theory.
Learning outcomes:
- Understand the Data Science process
- Basic use of some Data Science tools for Big (and smaller) Data
- Basic use of Apache Hadoop and Apache Spark
- Data visualisation for exploratory analysis
- Basic knowledge of machine learning models
Target Audience
Software engineers who want to get hands-on with data science. Coding skills are required. No prior knowledge of data science or machine learning is expected. Some experience with Python is helpful but not required.
Technical Requirements
You need a laptop with a web browser and an SSH client that can connect to a remote server. A text editor may also come in handy.
The workshop is limited to 20 attendees.